Cancer has been characterized as a constellation of hundreds of diseases differing in underlying
mutations and depending on cellular environments. Carcinogenesis as a stochastic physical process
has been studied for over sixty years, but there is no accepted standard model. We show that
the hazard rates of all cancers are characterized by a simple dynamic stochastic process on a
half-line, with a universal linear restoring force balancing a universal simple Brownian motion
starting from a universal initial distribution. Only a critical radius defining the transition from
normal to tumorigenic genomes distinguishes between different cancer types when time is measured
in cell-cycle units. Reparametrizing to chronological time units introduces two additional
parameters: the onset of cellular senescence with age and the time interval over which this
cessation in replication takes place. This universality implies that there may exist a finite
separation between normal cells and tumorigenic cells in all tissue types that may be a viable
target for both early detection and preventive therapy.

PACS numbers: 87.19.xj, 89.75.Fb, 87.14.gk 


Cancer is considered to be a multifaceted disease where the phenotypic similarities of tumor
progression are a veneer over a multitude of possible underlying genetic alterations jT].
Three-quarters of all cancers are probably sporadic. As an organism ages, the accumulation of
mutations increases the likelihood of alteration in an oncogene or in a tumor suppressor gene,
which in turn can lead to an accumulation of mutations. The process of carcinogenesis has been
modeled for over 60 years mni, but there is no consensus model. Summaries of the state of cancer
incidence modeling can be found in Harding, Pompei and Wilson m and Beerenwinkel et al. m-

Recently, Tomasetti and Vogelstein showed that the lifetime risk of cancers of many different
types is correlated with the total number of divisions of the normal self-renewing cells, the
somatic stem cells, that maintain each tissue’s homeostasis. This implies that most cancer is due
to random mutations arising during DNA replication in normal, noncancerous somatic stem cells.
This observation poses an interesting challenge for a mechanistic understanding of cancer
incidence. If a somatic stem cell can become cancerous at any time during a lifetime, any valid
model of carcinogenesis should be able to use this as a basis for age-specific cancer hazard rate
prediction and match available age-specific cancer incidence rates [TS].

Harding et al. m have analyzed the Surveillance, Epidemiology and End Results (SEER jTB],
specifically SEER 9) cancer registries to compile age-specific incidence rates, with particular
care accorded to the data on the very elderly (ages > 80 years). They noted that incidence for
most cancers reached a maximum between ages 75 years and 90 years, with a precipitous decline
later, and often tended to vanish among centenarians. This decline is difficult to explain in stem
cell models that assume that any stem cell will eventually produce a tumor [13]. With exceptions,
for example, the beta model [5] and the generalized beta model [H], such models project
increasing cancer rates throughout adulthood.

Our aim here is to present a simple universal physical model for the stochastic process of
tumorigenesis that resolves the tension between the spontaneous random origin of cancer shown in
Ref. m and observed age-specific incidence rate curves nsiiiHi. While tissues are heterogeneous
in cellular characteristics, recent work on inducing pluripotency [20] in differentiated cells
[19], along with work on somatic stem cells with regard to cancer |5T], suggests that such stem
cells share commonalities. If so, the fundamental process of cellular replication, and the
fidelity of the concomitant information propagation, is most likely to be universal. From a
physical perspective, the propagation of information by replication is a stochastic process with
error correction in the form of a multitude of repair mechanisms [25]. We expect, then, that a
limit on error correction should be universal to all cancers in a given species. In other words,
irrespective of tissue or cell type, we expect that there should be a coordinate, measuring the
effective error in the propagated genome, that at a critical value marks a sharp transition
between normal and tumor cells. If there is no such sharp transition, the utility and feasibility
of early detection of cancer is called into question. Here, we show that there is such a
coordinate, and a universal diffusion process for the position of each stem cell on this
coordinate, such that age-specific cancer hazard rates are determined by the probability of
crossing a cancer-type-independent error limit. Diffusion processes have been studied in this
connection [26], but the focus was on the accumulation of mutations in the pre-cancer phase. The
role of error correction in computing cancer incidence rates has not been investigated.

We set out to compare possible diffusive processes for different tissues. Although DNA mutations
and epigenetic changes can be introduced into a non-mitotic cell (e.g. through a viral insertion
or the action of retrotransposons), we focus on the errors that accumulate during cell division
(i.e. S phase). The human genome has $3 \times 10^9$ base pairs that can mutate, but not all genome
positions contribute equally to cancer susceptibility. We introduce an effective error coordinate,
$r$, parameterizing an appropriately weighted mean alteration distance in genome space, including
both mutations and epigenetic changes, such as methylation changes that affect DNA repair genes
themselves. We hypothesized that cancer occurs when this error coordinate exceeds a critical
value, $r_c$.

The cell constantly expends energy for DNA error correction, which acts as a restoring force to
oppose mutational diffusion [25]. We choose a restoring force that scales linearly with error
coordinate. The error coordinate therefore obeys an Ornstein-Uhlenbeck stochastic process (^ on
the half line. There are two physical motivations for this choice of restoring force: first, the
error correction response is then tolerant of infinitesimal excursions away from the original
genome, and second, a discrete urn model of mutating bases has this process as a scaling limit.
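
As a rough illustration of the process just described, the following sketch simulates an ensemble of error coordinates with an Euler-Maruyama step and a reflecting wall at the origin. It is not the authors' code: the function names, step size, ensemble size, and the initial width of 0.2 are illustrative assumptions, and the noise amplitude assumes a unit diffusion coefficient when the process is written in Fokker-Planck form. With a linear restoring force $-r/r_g^2$, the ensemble would be expected to relax toward a half-Gaussian of width $r_g$, which can be checked through the mean square of the simulated coordinates.

```python
import math
import random


def simulate_reflected_ou(n_cells=4000, n_steps=300, dtau=0.01,
                          r_g=1.0, r_alpha=0.2, seed=0):
    """Euler-Maruyama sketch of an Ornstein-Uhlenbeck process on the half line.

    All defaults are illustrative assumptions, not values taken from the text.
    Drift: -r / r_g**2 (linear restoring force).
    Noise: sqrt(2 * dtau) per step (unit diffusion coefficient, assumed).
    Reflection at r = 0 is imposed by taking the absolute value after each step.
    """
    rng = random.Random(seed)
    # Half-Gaussian initial condition of width r_alpha.
    r = [abs(rng.gauss(0.0, r_alpha)) for _ in range(n_cells)]
    noise = math.sqrt(2.0 * dtau)
    for _ in range(n_steps):
        r = [abs(x - (x / r_g ** 2) * dtau + noise * rng.gauss(0.0, 1.0))
             for x in r]
    return r


def mean_square(values):
    """Ensemble average of r**2."""
    return sum(v * v for v in values) / len(values)


if __name__ == "__main__":
    coords = simulate_reflected_ou()
    # Expected to be close to r_g**2 = 1 once the ensemble has relaxed.
    print("mean square error coordinate:", mean_square(coords))
```

Reflection by absolute value is a convenient choice here because the drift is odd in $r$ and the noise is symmetric, so the reflected walk has the same statistics as the magnitude of an unconstrained walk.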

The probability density for r obeys the Fokker-Planck 
equation 


$$\partial_\tau p(r, \tau) = r_g^{-2}\, \partial_r \bigl[ r\, p(r, \tau) \bigr] + \partial_r^2 p(r, \tau) \qquad (1)$$

with a reflecting boundary condition at the origin (i.e. $\partial_r p(r = 0, \tau) = 0$). This
equation can be analytically solved. We allow for the possibility of mitotic mutations during
prenatal development by parameterizing the initial density as
$p(r, \tau = 0) \propto \exp(-\alpha r^2 / 2 r_g^2)$, where $\alpha$ is dimensionless and
$\alpha = \infty$ for an error-free initial genome. Given that we only consider error accumulation
during mitosis, the relevant time scale is measured in terms of the number of cell divisions,
which may not be constant in time. In fact, it is well known that cells become senescent with age.
Hence, to relate $\tau$ to chronological time $t$, we introduce a reparameterization

$$d\tau = dt\, D(t) = dt\; 0.5 \left[ 1 + \tanh\!\left( \frac{T_s - t}{W_s} \right) \right], \qquad (2)$$

where $T_s$ is a cell-type-specific senescence time, and $W_s$ a cell-type-specific senescence
time uncertainty, motivated by Ref. m- The probability of having become cancerous is given by
$p(r > r_c) = \int_{r_c}^{\infty} p(s, t)\, ds$. The cancer hazard rate is then the time
derivative of the odds $p(r > r_c)/p(r < r_c)$:


$$I(t) = \frac{d}{dt}\, \frac{\int_{r_c}^{\infty} ds\, p(s, t)}{\int_{0}^{r_c} ds\, p(s, t)} \qquad (3)$$


Hazard rates are better estimated from incidence [18] rather than mortality data [15].
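
One way to make Eqs. (1)-(3) concrete is the sketch below. It rests on assumptions that the text does not spell out: the Fokker-Planck equation is taken as $\partial_\tau p = r_g^{-2} \partial_r (r p) + \partial_r^2 p$, the division-rate factor as $D(t) = 0.5\,[1 + \tanh((T_s - t)/W_s)]$, and $t$ and $\tau$ are measured in the same units. Under these assumptions, a half-Gaussian of initial width $r_\alpha$ stays a half-Gaussian with variance $\sigma^2(\tau) = r_g^2 + (r_\alpha^2 - r_g^2)\, e^{-2\tau/r_g^2}$, so that $p(r > r_c) = \mathrm{erfc}\bigl(r_c/\sqrt{2\sigma^2}\bigr)$. All function names are hypothetical.

```python
import math


def division_rate(t, T_s, W_s):
    """Assumed form of D(t): falls from ~1 to ~0 around t = T_s over a width W_s."""
    return 0.5 * (1.0 + math.tanh((T_s - t) / W_s))


def _log_cosh(x):
    """Numerically stable log(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def tau_of_t(t, T_s, W_s):
    """Replication time tau(t): the integral of D from 0 to t, in closed form."""
    return 0.5 * (t + W_s * (_log_cosh(T_s / W_s) - _log_cosh((T_s - t) / W_s)))


def variance(tau, r_alpha, r_g=1.0):
    """Variance of the half-Gaussian solution at replication time tau."""
    return r_g ** 2 + (r_alpha ** 2 - r_g ** 2) * math.exp(-2.0 * tau / r_g ** 2)


def prob_beyond(tau, r_c, r_alpha, r_g=1.0):
    """Probability that the error coordinate exceeds r_c."""
    return math.erfc(r_c / math.sqrt(2.0 * variance(tau, r_alpha, r_g)))


def hazard(t, r_c, r_alpha, T_s, W_s, r_g=1.0):
    """Per-cell hazard: time derivative of the odds p(r > r_c) / p(r < r_c)."""
    tau = tau_of_t(t, T_s, W_s)
    v = variance(tau, r_alpha, r_g)
    z = r_c / math.sqrt(2.0 * v)
    p = math.erfc(z)
    dp_dv = z * math.exp(-z * z) / (math.sqrt(math.pi) * v)  # chain rule via z(v)
    dv_dtau = 2.0 * (r_g ** 2 - v) / r_g ** 2                # relaxation toward r_g**2
    dp_dt = dp_dv * dv_dtau * division_rate(t, T_s, W_s)
    return dp_dt / (1.0 - p) ** 2                            # d/dt of p / (1 - p)


if __name__ == "__main__":
    # Illustrative parameter values of the same order as those quoted in the text.
    print(hazard(0.5, r_c=2.6, r_alpha=0.22, T_s=90.0, W_s=10.0))
```

The sketch returns a per-cell rate in the units of $\tau$; any conversion between years and cell cycles, and the overall scale factor relating it to tissue-level incidence, would have to be supplied separately.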


The model is specified by the parameters $\alpha$, $r_g$, $T_s$, $W_s$ and $r_c$. Our goal is to
fit the predicted cancer hazard rate to the observed incidence rates for various cancer types
[TH]. However, since we are free to choose a scale to measure errors, we can fix $r_g = 1$. Using
Nelder-Mead minimization for parameter determination, we only needed to fit the incidence curve
$I(t)$ up to a scale factor, since our theory considers the per-stem-cell hazard rate while the
total hazard rate is this rate summed independently over all stem cells in the tissue HZ). If
there are only $x$ susceptible cells in the tissue, then the hazard rate for that tissue is $x$
times the rate for a single somatic stem cell in that tissue.
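
Fitting "up to a scale factor" can be sketched as follows. The objective function used with Nelder-Mead is not stated in the text, so this example assumes an ordinary least-squares misfit, for which the best scale $x$ at fixed curve shape has a closed form and need not be searched for by the optimizer. The function names are hypothetical.

```python
def best_scale(observed, predicted):
    """Least-squares scale x minimizing sum((obs - x * pred)**2).

    Setting the derivative with respect to x to zero gives
    x = sum(obs * pred) / sum(pred**2).
    """
    numerator = sum(o * p for o, p in zip(observed, predicted))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator


def scaled_misfit(observed, predicted):
    """Residual sum of squares after profiling out the scale factor."""
    x = best_scale(observed, predicted)
    return sum((o - x * p) ** 2 for o, p in zip(observed, predicted))


if __name__ == "__main__":
    per_cell = [1.0, 2.0, 3.0]    # hypothetical per-cell hazard values
    incidence = [2.0, 4.0, 6.0]   # hypothetical tissue-level incidence
    print(best_scale(incidence, per_cell), scaled_misfit(incidence, per_cell))
```

With the scale removed in this way, a simplex search would only need to explore the shape parameters of the curve.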

Expressing the cancer-specific values of $\alpha$ as $r_\alpha = r_g/\sqrt{\alpha}$, we found
that the best-fit $r_c$ and $r_\alpha$ values for all cancers depended only weakly on cancer type
(Table I). The average $r_c$ value was 2.6 and the mean width of the initial distribution was
$r_\alpha = 0.22$. The Bayes Information Criterion (BIC) can be used to compare models taking
model complexity into account. The BIC for a model with $r_\alpha = 0.16$ fixed for all tumor
types was $\mathrm{BIC}_\alpha = 1.75 \times 10^4$, while that for models with
cancer-type-specific initial distributions was $\mathrm{BIC}_0 = 2.092 \times 10^4$. Some examples
are shown in Fig. 1. Thus, notwithstanding the fact that the initial distribution is tissue- and
cell-type-specific, as proliferation during development is heterogeneous and different tissues
undergo tissue-specific replication and apoptosis cycles, the model selected by the BIC is the
one with a universal initial distribution. Of course, cell proliferation during growth and
development is controlled differently in children and in adults, and our results apply only to
tumorigenesis in adults. With two exceptions (a subtype of Hodgkin lymphoma and testicular
cancer), the senescence time was very long, on the order of 90 years. For these two exceptions,
the time interval, $W_s$, over which the somatic stem cells in these tissues stop dividing was
much longer.
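
The kind of comparison described above can be sketched as follows. The text does not give the likelihood behind its BIC values, so this example assumes the common Gaussian-residual form, $\mathrm{BIC} = n \ln(\mathrm{RSS}/n) + k \ln n$ up to an additive constant, where $n$ is the number of data points, $k$ the number of fitted parameters, and RSS the residual sum of squares. The function name and all numbers in the example are hypothetical.

```python
import math


def bic_gaussian(rss, n_points, n_params):
    """BIC for a least-squares fit with Gaussian residuals (constant dropped).

    The first term rewards goodness of fit; the second penalizes each
    additional fitted parameter by log(n_points).
    """
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)


if __name__ == "__main__":
    n = 18  # hypothetical number of age bins for one cancer type
    # Hypothetical fits: freeing one more parameter lowers the misfit slightly.
    bic_shared = bic_gaussian(rss=4.2, n_points=n, n_params=3)
    bic_free = bic_gaussian(rss=4.0, n_points=n, n_params=4)
    # The model with the lower BIC is preferred.
    print("shared:", bic_shared, "free:", bic_free)
```

Summing such per-cancer values over all cancer types gives a single number per model, so a model that shares one parameter across types saves one penalty term per type.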

We tried to simplify the model further by fixing $r_c$ and $r_\alpha$ for all tumor types, and
the resulting model with fixed $r_c = 4.4$ and $r_\alpha = 0.26$ reached a BIC value of
$\mathrm{BIC}_{\alpha,c} = 2.087 \times 10^4$. This value is slightly less than $\mathrm{BIC}_0$
for the model with $r_c$ and $r_\alpha$ both optimized for each tumor type. Thus, surprisingly, we
find that genomic diffusion, error correction, the initial error distribution, and the critical
error radius are largely universal factors with regard to the incidence of cancer in somatic stem
cells in all tissues. The only factors that are mostly cancer-specific are the senescence age and
the time interval over which senescence occurs. With a finer categorization of cancer types, and
more precise knowledge of the number of somatic stem cells relevant for each cancer type, it
should be possible to determine parameters more precisely by including the scale of the
coordinate distribution in the optimization, as $r_c$ and $r_\alpha$ are obviously not completely
independent.

FIG. 1. Cancer incidence rates. Models with all parameters fitted (Full) and those with
$r_\alpha = 0.16$ are almost indistinguishable. Incidence is measured per $10^5$ person-years at
risk.

The inability of some previous models [7] to match incidence rates that peak and then decrease
led Ref. [15] to suggest some possible resolutions. The resolution embodied in our model is that
there is a DNA error-correcting process so that every stem cell does not, in fact, proceed to
carcinogenesis given enough time, and that senescence correlates the reduction in tumorigenesis
with the reduction in mitosis. Indeed, Ref. [15] explicitly argued that tissue and cellular
senescence are the likely biological mechanisms for the observed drop-off in cancer incidence in
the very elderly. The beta model [H] is also consistent with the latter part of the resolution,
but does not have a dynamic basis nor does it posit a role for DNA repair.

DNA repair involves a multitude of different proteins in distinct pathways [53]. It is remarkable
that the sum total end result of all these repair mechanisms attempting to correct random
mutations is apparently a linear stochastic restoring force. Apart from a constant force, this is
the simplest possible functional form. We speculate that a constant restoring force would require
an inordinate energetic effort for even innocuous mutations, whereas a linear restoring force
allows a graduated escalation in repair effort based on the effective error coordinate. Tolerance
to low levels of mutation has also been suggested to be necessary for evolution [57]. This seems
to support our linear restoring force, but with the caveat that cancers mostly arise from somatic
stem cells whereas evolution is due to germline mutations, and the error correction mechanisms
need not be the same for the two cell types. However, it may be the case that a linear restoring
force confers higher fitness in both cases, although for different reasons.

In conclusion, the simple physical understanding we have presented here suggests that the space
of mutational histories has a natural diffusion away from the initial starting distribution,
restrained by a universal error correction, and starting from a universal initial distribution.
The incidence rate for all cancers is the rate of moving beyond a threshold that depends weakly on
tumor type. A relatively sharp demarcation between tumors and normal cells is a concrete
prediction of our model of tumorigenesis. This universality suggests that reducing the incidence
of sporadic cancers requires enhancing mechanisms that maintain the fidelity of DNA replication in
somatic stem cells throughout a lifetime. While this is a facile observation, the universality we
found suggests that a multitude of strategies is not required on a tissue-by-tissue basis. More
importantly, there is an interval between $r_g = 1$ and $r_c \approx 2.6$ for almost all cancers
where the mutated somatic stem cell genome has not yet become tumorigenic, and yet may be
distinguishable from a normal stem cell, since the initial genome distribution has a width
$r_\alpha \approx 0.2$. Detecting somatic stem cells in this interval early, and then targeting
therapies towards ablating them, is a possible approach to reducing the incidence of cancer. The
universality we have found provides a measure of hope that there may be tissue-independent
commonalities in both detection and therapy that could prevent metastases.
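
The separation of scales invoked above can be illustrated with a short calculation. It assumes, as a simplification not stated in the text, that both the initial distribution and the long-time distribution of the error coordinate are half-Gaussians (of widths of about 0.2 and $r_g = 1$, respectively), so that the fraction of cells beyond a given radius is a complementary error function. The function name is hypothetical.

```python
import math


def fraction_beyond(radius, width):
    """Fraction of a half-Gaussian of the given width lying beyond `radius`."""
    return math.erfc(radius / (width * math.sqrt(2.0)))


if __name__ == "__main__":
    # Initial distribution (width ~0.2): fraction already beyond r = r_g = 1.
    print("initial, beyond r = 1:    ", fraction_beyond(1.0, 0.2))
    # Relaxed distribution (width r_g = 1): fraction beyond r = 1 and r_c = 2.6.
    print("relaxed, beyond r = 1:    ", fraction_beyond(1.0, 1.0))
    print("relaxed, beyond r_c = 2.6:", fraction_beyond(2.6, 1.0))
```

Under these assumptions fewer than one cell in a million starts beyond $r = 1$, while a relaxed population would place a little under 1% of cells beyond $r_c = 2.6$, which gives a sense of how much room the intermediate interval leaves.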
TABLE I. Parameters for various cancers (NS, Nervous System; (non-)Hodgkin, (non-)Hodgkin Lymphoma)

Tissue        Sex  BIC       T_s (yrs)  W_s (yrs)  r_α    r_c
All           M    418       93.7       14.2       0.119  2.68
Brain & NS    M    45.6      90.3       7.15       0.259  2.5
Breast        M    67.7      97.7       0.515      0.151  2.5
Colon         M    178       98         7.32       0.247  2.88
Esophagus     M    29.4      92.8       14.2       0.1    2.59
Hodgkin1      M    14.4      6.18e-08   32.8       0.159  1.19
Hodgkin2      M    129       90.2       12.7       0.1    2.25
Kidney        M    1e+04     92.3       13         0.1    2.38
Larynx        M    110       80.6       16.3       0.166  2.86
Leukemia      M    342       103        2.7        0.269  2.83
Liver         M    290       94.6       9.4        0.199  2.34
Lung          M    120       88.5       14.8       0.181  3.02
Melanoma      M    150       97.5       5.01       0.229  2.29
Mesothelioma  M    50.2      87.4       7.33       0.285  3.97
Misc          M    80.3      99.5       3.93       0.219  3.08
Myeloma       M    18.5      96         9.22       0.183  2.65
Non-Hodgkin   M    76.2      97.1       9.67       0.194  2.57
Oral          M    501       98.5       7.86       0.1    1.98
Pancreas      M    22.8      96.9       7.52       0.261  2.89
Prostate      M    2.48e+03  74.7       22.9       0.1    3.51
Stomach       M    33.5      99.5       5.95       0.256  2.88
Testis        M    33.4      28.9       20.1       0.135  1.59
Thyroid       M    23.9      87.2       12.2       0.242  1.9
Urinary       M    2.68e+03  97.2       9.04       0.182  2.89
All           F    2.01e+03  99.9       13.4       0.197  2.4
Brain & NS    F    38.1      92.4       6.85       0.327  2.5
Breast        F    1.64e+03  91         30.3       0.1    2.2
Cervix Uteri  F    204       96.7       7.66       0.103  1.33
Colon         F    51.6      100        10.4       0.228  2.91
Corpus Uteri  F    516       81.1       19.1       0.1    2.43
Esophagus     F    18.4      99.3       5.79       0.282  2.87
Hodgkin1      F    13.4      3.12e-09   22.4       0.142  1.37
Hodgkin2      F    124       93         12.1       0.1    2.19
Kidney        F    50.9      93         11         0.214  2.46
Leukemia      F    192       102        4.63       0.308  2.9
Lung          F    36.8      82.7       14         0.12   3.08
Misc          F    57.2      107        5.01       0.428  3.5
Non-Hodgkin   F    113       97.3       9.88       0.222  2.65
Larynx        F    18.6      71.9       16.7       0.365  3.91
Liver         F    24.9      92         12.8       0.1    2.69
Melanoma      F    19.8      102        3.19       0.226  1.7
Mesothelioma  F    15.6      90         4.51       0.187  3.04
Myeloma       F    49.8      92.3       7.35       0.275  2.99
Oral          F    32.4      99.4       5.52       0.235  2.39
Ovary         F    269       96.6       9.21       0.211  2.27
Pancreas      F    24.3      103        14         0.1    2.77
Stomach       F    140       106        3.48       0.173  2.72
Thyroid       F    20.8      80.7       23.6       0.113  1.28
Urinary       F    45.2      98.8       5.6        0.251  2.72




